Generating Vector Code for Matrix-matrix Multiplication

Authors

  • Joohoon Lee
  • Dongkeun Lee
Abstract

The current state-of-the-art Matrix-Matrix Multiplication (MMM) kernel generator is ATLAS, which produces the best-performing MMM code by search. However, computer architectures change rapidly, and it is hard to generate high-performance code without knowing how to use new instruction sets. Because ATLAS uses neither blocking for the L2 cache nor SSE/SSE2 instructions, we set out to improve it and obtain higher MMM performance than the original ATLAS. Our experimental results show that high performance can be obtained with SSE/SSE2, which is available on newer generations of the Pentium.
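
The abstract names two ingredients, cache blocking and SSE/SSE2 vector instructions, without showing them. The following is a minimal C sketch of how the two can be combined in a hand-written kernel. It is an illustration under stated assumptions, not the authors' generated code or ATLAS output: the function name `mmm_blocked`, the helper `min_sz`, the row-major square layout, and the block-size parameter `bs` are all hypothetical, and a real generator would choose the block size by search.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h> /* SSE2 intrinsics (two doubles per 128-bit register) */
#endif

/* Hypothetical helper: smaller of two sizes, used to clip edge tiles. */
static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/*
 * Hypothetical kernel: C += A * B for row-major n x n matrices of doubles.
 * The three outer loops walk bs x bs tiles so that the working set of the
 * inner loops can stay in cache; bs is an assumed tuning parameter.
 */
void mmm_blocked(size_t n, size_t bs,
                 const double *A, const double *B, double *C)
{
    assert(bs > 0);
    for (size_t ii = 0; ii < n; ii += bs) {
        size_t i_end = min_sz(ii + bs, n);
        for (size_t kk = 0; kk < n; kk += bs) {
            size_t k_end = min_sz(kk + bs, n);
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t j_end = min_sz(jj + bs, n);
                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t k = kk; k < k_end; ++k) {
                        double a = A[i * n + k];
                        size_t j = jj;
#ifdef __SSE2__
                        /* Broadcast a, then update two C entries at a time. */
                        __m128d va = _mm_set1_pd(a);
                        for (; j + 2 <= j_end; j += 2) {
                            __m128d vb = _mm_loadu_pd(&B[k * n + j]);
                            __m128d vc = _mm_loadu_pd(&C[i * n + j]);
                            vc = _mm_add_pd(vc, _mm_mul_pd(va, vb));
                            _mm_storeu_pd(&C[i * n + j], vc);
                        }
#endif
                        /* Scalar remainder, and full fallback without SSE2. */
                        for (; j < j_end; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        }
    }
}
```

The caller is expected to zero `C` first if a plain product is wanted, since the kernel accumulates into it. Unaligned loads and stores are used here so the sketch works for any `n` and `bs`; aligned accesses and register tiling would be natural next steps but are outside what the abstract describes.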

Similar resources

Optimization by Run-time Specialization for Sparse Matrix-Vector Multiplication (Submitted for publication)

Run-time specialization is the process of generating programs based on information available only at run time. This technique has the potential of generating highly efficient codes, at the expense of the overheads of the run-time code generation. It is applicable when some input data is used repeatedly while other input data varies. In this paper we explore the potential for obtaining speedups ...

Optimization of Sparse Matrix-Vector Multiplication by Specialization

Program specialization is the process of generating optimized programs based on available inputs. It is particularly applicable when some input data are used repeatedly while other input data vary. Specialization can be employed at compile-time as well as at run-time, depending on when the inputs become available. In this paper we explore the potential for obtaining speed-ups for sparse matrix-...

A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure

The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that can run on a Fibonacci Hypercube structure. Most popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure; therefore, a method that can run on all structures, especially the Fibonacci Hypercube structure, is necessary for parallel matr...

Design of Logic Network for Generating Sequency Ordered Hadamard Matrix H

A logic network that produces the sequency-ordered Hadamard matrix H, based on the properties of Gray code and orthogonal group codes, is developed. The network uses a counter to generate Rademacher functions such that the output of H is in sequency order. A general-purpose shift register with output logic is used to establish a sequence of period P corresponding to a given value of order m of the Hadam...

Generating Optimized Sparse Matrix Vector Product over Finite Fields

Sparse Matrix Vector multiplication (SpMV) is one of the most important operations in exact sparse linear algebra. The numerical community has done a great deal of research on efficient sparse matrix formats. However, when computing over finite fields, one needs to deal with multi-precision values and more complex operations. In order to provide highly efficient SpMV kernel over finite ...


Publication date: 2005